Abstract. We consider the problem of preprocessing $N$ points in 2D,
each endowed with a priority, to answer the following queries: given
an axis-parallel rectangle, determine the point with the largest priority
in the rectangle. Using the ideas of the effective entropy of range
maxima queries and succinct indices for range maxima queries, we obtain
a structure that uses $O(N)$ words and answers the above query in
$O(\lg N \lg \lg N)$ time. This is a direct improvement of Chazelle's result
from 1985 [11] for this problem: Chazelle required $O(N/\epsilon)$ words to
answer queries in $O((\lg N)^{1+\epsilon})$ time for any constant $\epsilon > 0$.

1 Introduction 

Range searching is one of the most fundamental problems in computer science
with important applications in areas such as computational geometry, databases
and string processing. The input is a set of $N$ points in general position in
$\mathbb{R}^d$ (we focus on the case $d = 2$), where each point is associated
with satellite data, and an aggregation function defined on the satellite data.
We wish to preprocess the input to answer queries of the following form
efficiently: given any 2D axis-aligned rectangle $R$, return the value of the
aggregation function on the satellite data of all points in $R$. Researchers
have considered range searching with respect to diverse aggregation functions
such as emptiness checking, counting, reporting, minimum/maximum, etc. [18].
In this paper, we consider the problem of range maximum searching (the minimum
variant is symmetric), where the satellite data associated with each point is
a numerical priority, and the aggregation function is "arg max", i.e., we want
to report the point with the maximum priority in the given query rectangle.
This aggregation function is the canonical one to study, among those of the
"commutative semi-group" class [13, 11].

Our primary concern is the space requirement of the data structure: we
aim for linear-space data structures, namely those that occupy $O(N)$ words,
and seek to minimize query time subject to this constraint. Space usage is
a fundamental concern in geometric data structures due to very large data
volumes; indeed, space usage is a main reason why range searching data
structures like quadtrees, which have poor worst-case query performance, are
preferred in many practical applications over data structures such as range
trees, which have asymptotically optimal query performance. Space-efficient
solutions to range searching date to the work of Chazelle [11] over a quarter
century ago, and Nekrich [19] gives a nice survey of much of this work.
Recently there has been a flurry of activity on various aspects of
space-efficient range reporting, and for some aggregation functions there has
even been attention given to the constant term within the space usage [5, 19].

We now formalize the problem studied by our paper, as well as those of [11, 9,
16]. We assume input points are in rank space: the x-coordinates of the $N$
points are $\{0, \ldots, N - 1\} = [N]$, and the y-coordinates are given by a
permutation $\nu : [N] \to [N]$, such that the points are $(i, \nu(i))$ for
$i = 0, \ldots, N - 1$. The priorities of the points are given by another
permutation $\pi$ such that $\pi(i)$ is the priority of the point
$(i, \nu(i))$. The reduction to rank space can be performed in $O(\lg N)$ time
with a linear space structure even if the original and query points are points
in $\mathbb{R}^2$ [13, 11]. The query rectangle is specified by two points from
$[N] \times [N]$ and includes the boundaries (see Fig. 1(R)). Analogous to
previous work, we also assume the word-RAM model with word size
$\Theta(\lg N)$ bits.
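
To make the problem statement concrete, the following is a minimal brute-force
sketch in Python of a 4-sided range maximum query in rank space. The function
name `range_max_bruteforce` and the list-based representation of the
permutations $\nu$ and $\pi$ are assumptions made for illustration only; the
linear scan serves as a correctness reference and is not the data structure
developed in this paper.

```python
def range_max_bruteforce(nu, pi, x1, x2, y1, y2):
    """Return the x-coordinate i of the highest-priority point (i, nu[i])
    inside the closed rectangle [x1, x2] x [y1, y2], or None if it is empty.

    nu[i] is the y-coordinate and pi[i] the priority of the point with
    x-coordinate i (both lists are permutations of range(N)).
    """
    best = None
    for i in range(max(x1, 0), min(x2, len(nu) - 1) + 1):
        if y1 <= nu[i] <= y2 and (best is None or pi[i] > pi[best]):
            best = i
    return best


# Hypothetical 4-point instance: points (0,2), (1,0), (2,3), (3,1).
nu = [2, 0, 3, 1]
pi = [1, 3, 0, 2]
```

A query is given by its two corner points, so `range_max_bruteforce(nu, pi, 0, 3, 2, 3)`
asks for the best point among those with y-coordinate 2 or 3.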




Fig. 1. 2-sided and 4-sided range maximum queries. The numbers with the points 
represent their priorities, and the unshaded points are the answers. 

Range maximum searching is a well-studied problem (see Table 1). Chazelle [11]
gave a few space/time tradeoffs covering a broad spectrum. To the best of our
knowledge, the solution with the lowest query time that uses only $O(N)$ words
is still that of Chazelle [11], who gave a query time that is polylogarithmic
in $N$. More precisely, he gave a data structure of size
$O(\frac{1}{\epsilon} N)$ words with query time $O(\lg^{1+\epsilon} N)$ for any
fixed $\epsilon > 0$. Other recent results on the range maximum problem are as
follows. Karpinski et al. [16] studied the problem of 3D five-sided range
emptiness queries, which is closely related to range maximum searching in 2D.
As observed in [9], their solution yields a query time of
$(\lg \lg N)^{O(1)}$ with an index of size $N (\lg \lg N)^{O(1)}$ words. Chan
et al. [9] currently give the best query time of $O(\lg \lg N)$, but this is
at the expense of using $O(N \lg^{\epsilon} N)$ words, for any fixed
$\epsilon > 0$. However, there has been no improvement in the running time for
linear-space data structures. In this paper, we improve Chazelle's
long-standing result by giving a data structure of $O(N)$ words and reducing
the query time from polylogarithmic to "almost" logarithmic, namely,
$O(\lg N \lg \lg N)$. Although our primary focus is on 4-sided queries, which
specify a rectangle that is bounded from all sides, we also need to consider
2-sided and 3-sided queries, which are "open" on two sides and one side
respectively (thus a 2-sided query is specified by a single point, see
Fig. 1(L), and a 3-sided query by two points $(i, j)$ and $(k, l)$ where either
$i = k$ or $j = l$). Our solution recursively divides the points into
horizontal and vertical slabs, and a query rectangle is decomposed into smaller
2-sided, 3-sided, and 4-sided queries. A key intermediate result is the data
structure for 2-sided queries. The 2-sided sub-problems are partitioned into
smaller sub-problems, which are stored in a "compressed" format that is then
"decompressed" on demand. The "compression" uses the idea that to answer
2-sided range maxima queries on a problem of size $m$, one need not store the
entire total order of priorities using $\Theta(m \lg m)$ bits: $O(m)$ bits
suffice, i.e., the effective entropy [14] of 2-sided queries is low. This does
not help immediately, since $\Theta(m \lg m)$ bits are needed to store the
coordinates of the points comprising these sub-problems: to overcome this
bottleneck we use ideas from succinct indices [2], namely, we separate the
storage of point coordinates from the data structure for answering
range-maximum queries. The latter data structure does not store point
coordinates but instead obtains them as needed from a global data structure.

Citation                Size (in words)               Query time
Chazelle [11]           $O(\frac{1}{\epsilon} N)$     $O(\lg^{1+\epsilon} N)$
Karpinski et al. [16]   $N (\lg \lg N)^{O(1)}$        $(\lg \lg N)^{O(1)}$
Chan et al. [9]         $O(N \lg^{\epsilon} N)$       $O(\lg \lg N)$
This paper              $O(N)$                        $O(\lg N \lg \lg N)$

Table 1. Space/time tradeoffs for 2D range maximum searching in the word RAM.
We solve 3-sided and 4-sided subqueries by recursion, or by using structures
for range maximum queries on matrices [22, 6]. When recursing, we cannot afford
the space required to store the structures for rank space reduction for each
such subproblem: a further key idea is to use a single global structure to map
references to points within recursive subproblems back to the original points.

By reusing ideas from the data structure for 2-sided queries, we obtain two
stand-alone results on succinct indices for 2-sided range maxima queries. In a
succinct index, we wish to answer queries on some data, and it is assumed that
the succinct index is not charged for the space required to store the data
itself, but only for any additional space it requires to answer the queries.
However, the succinct index can access the data only through a specific
interface (see [2] for a discussion of the advantages of succinct indices). In
our case, given $N$ points in rank space, together with priorities, we wish to
answer 2-sided range-maximum queries under the condition that the point
coordinates are stored "elsewhere" (the data structure is not "charged" for
the space needed to store the points "elsewhere") and are assumed to be
available to the data structure in one of two ways:

- Through an orthogonal range reporting query. Here, we assume that a query
  that results in $k$ points being reported takes $T(N, k)$ time. We assume
  that $T$ is such that $T(N, O(k)) = O(T(N, k))$.

- As in Bose et al. [4], we assume that the point coordinates are stored in
  read-only memory, permitting random access to the coordinates of the $i$-th
  point. However, the ordering of the points is specified by the data
  structure, which we call the permuted-point model.

In both cases, we are able to achieve $O(N)$-bit indices with fast query time,
namely $O(\lg \lg N \cdot (\lg N + T(N, \lg N)))$ time, and $O(\lg \lg N)$
time, respectively.

The paper is organized as follows. We first describe some building blocks in
Section 2. Section 3 is devoted to our main result, and Section 4 describes the
succinct index results.

2 Preliminaries 

Our result builds upon a number of relatively new results, all of which are
essential in one way or the other to obtain the final bound. In order to
support mapping between recursive sub-problems, we use the following primitives
on a set $S$ of $N$ points in rank space. A range counting query reports the
number of points within a query rectangle:

Lemma 1 ([15]). Given a set of $N$ points in rank space in two dimensions,
there is a data structure with $O(N)$ words of space that supports range
counting queries in $O(\lg N / \lg \lg N)$ time.

A range reporting structure supports the operation of listing the coordinates
of all points within a query rectangle. We use the following consequence of a
result of Chan et al. [9]:

Lemma 2 ([9]). Given a set of $N$ points in rank space in two dimensions,
there is a data structure with $O(N)$ words of space that supports range
reporting queries in $O((1 + k) \lg^{1/3} N)$ time, where $k$ is the number of
points reported.

The range selection problem is as follows: given an input array $A$ of size
$N$, preprocess it so that given a query $(i, j, k)$, with
$1 \le i \le j \le N$, we return an index $i_1$ such that $A[i_1]$ is the
$k$-th smallest of the elements in the subarray $A[i], A[i+1], \ldots, A[j]$.

Lemma 3 ([7]). Given an array of size $N$, there is a data structure with
$O(N)$ words of space that supports range selection queries in
$O(\lg N / \lg \lg N)$ time.

Footnote (permuted-point model). Clearly, a succinct index for 2-sided range
maxima is interesting only if it uses $o(N)$ words $= o(N \lg N)$ bits of
space, so if the points are stored in arbitrary order in read-only memory, the
succinct index cannot itself store the permutation that re-orders the points.



3 The data structure 



In this section we show our main result: 

Theorem 1. Given $N$ points in two-dimensional rank space, and their
priorities, there is a data structure that occupies $O(N)$ words of space and
answers range maximum queries in $O(\lg N \lg \lg N)$ time.

We first give an overview of the data structure. We begin by storing all the
points (using their input coordinates) once each in the structures of Lemmas 1
and 2. We also store an instance of the data structure of Lemma 3 once each
for the arrays $X$ and $Y$, where $X[i] = \nu(i)$ and $Y[i] = \nu^{-1}(i)$ for
$i \in [N]$ ($X$ stores the y-coordinates of the points in order of increasing
x-coordinate, and $Y$ the x-coordinates in order of increasing y-coordinate).
These four "global" data structures use $O(N)$ words of space in all.

We recursively decompose the problem à la Afshani et al. [1]. Let $n$ be the
recursive problem size (initially $n = N$). Given a problem of size $n$, we
divide the problem into $n/k$ mutually disjoint horizontal slabs of size $n$ by
$k$, and $n/k$ mutually disjoint vertical slabs of size $k$ by $n$. A
horizontal and a vertical slab intersect in a square of size $k \times k$. We
recurse on each horizontal or vertical slab: observe that each horizontal or
vertical slab has exactly $k$ points in it, and is treated as a problem of size
$k$, i.e. it logically comprises two permutations $\nu$ and $\pi$ on $[k]$
(Fig. 2(L); Sec. 3.1). Clearly, given a slab in a problem of size $n$
containing $k$ points, we require some kind of mapping between the coordinates
in the slab (which in one dimension will be from $[n]$) and the recursive
problem in order to view the slab as a problem of size $k$. A key to the
space-efficiency of our data structure is that this mapping is not explicitly
stored, and is achieved through a slab-rank operation. The slab-select problem
is a generalization of the inverse of the slab-rank problem (Sec. 3.2).

The given query rectangle is decomposed into a number of disjoint 2-sided,
3-sided and 4-sided queries, based upon the decomposition of the input into
slabs. There are three kinds of terminal queries which do not generate further
recursive problems: all 2-sided queries are terminal, 4-sided queries whose
sides are slab boundaries are terminal, and all queries that reach slabs at the
bottom of the recursion are terminal. The problems (or data structures) that
involve answering terminal queries (or are used to answer terminal queries) are
also termed terminal. Each terminal query produces some candidate points: the
set of all candidate points must contain the final answer. A key invariant
needed to achieve the space bound is that all terminal problems of size $n$,
except those at the bottom of the recursion, use space $o(n \lg n)$ bits
(Sec. 3.1).

Clearly, since storing the input permutation representing the point sets in
terminal problems takes $\Theta(n \lg n)$ bits, we do not store them
explicitly. Instead, the terminal data structures are succinct indices: the
points that comprise them are accessed by means of queries to a single global
data structure. The terminal 4-sided problems use an index due to [6], while
the 2-sided problems reduce the range maximum query to planar point location.
Although there is a succinct index for planar point location [4], this
essentially assumes that the points that comprise the planar subdivision are
stored explicitly ("elsewhere") and permuted in a manner specified by their
data structure. In our case, if the points are represented explicitly, they
would occupy $O(N \lg N / \lg \lg N)$ words across all recursive problems. A
key to our approach is an implicit representation of the planar subdivision in
the recursive problems, relevant parts of which are recomputed from a
"compressed" representation at query time (Sec. 3.4); the other key step is to
note that $O(n)$ bits suffice to encode the priority information needed to
answer 2-sided queries in a problem of size $n$ (Sec. 3.3).

3.1 A recursive formulation and its space usage 

The recursive structure is as follows. Let $L = \lg N$, and consider a
recursive problem of size $n$ (at the top level $n = N$). We assume that $N$ is
a power of 2, as are a number of expressions which represent the size of
recursive problems. This can readily be achieved by replacing real-valued
parameters $x \ge 1$ by $2^{\lfloor \lg x \rfloor}$ or
$2^{\lceil \lg x \rceil}$ without affecting the asymptotic complexity. Unless
we have reached the bottom of the recursion, we partition the input range
$[n] \times [n]$ into mutually disjoint vertical slabs of width
$k = \sqrt{nL}$ and also into mutually disjoint horizontal slabs of height
$k = \sqrt{nL}$; each such slab contains $k$ points and can be logically viewed
as a recursive problem of size $k$ (see Fig. 2(L)). Observe that the input is
divided into $(n/k)^2 = n/L$ squares of size $k \times k$, each representing
the intersection of a horizontal slab with a vertical one. We need to answer
either 2-sided, 3-sided or 4-sided queries on this problem.

The data structures associated with the current recursive problem are:

- For problems at the bottom of the recursion, we store an instance of
  Chazelle's data structure, which uses $O(n \lg n)$ bits of space and has
  query time $O((\lg n)^2)$.

- For 2-sided queries (which are terminal) in non-terminal problems we use
  the data structure with space usage $O(n \sqrt{\lg n})$ bits described in
  Sec. 3.4.

- 3- and 4-sided queries, all of whose sides are slab boundaries, are
  square-aligned: they exactly cover a rectangular sub-array of squares. For
  such queries, we use an $O(n)$-bit data structure comprising:

  • An $n/k \times n/k$ matrix containing the (top-level) x and y coordinates
    and priority of the maximum point (if any) in each square. This uses
    $O((n/L) \cdot L) = O(n)$ bits.

  • The data structure of [22, 6] for answering 2D range maximum queries
    on the elements in the above matrix. This also uses $O(n)$ bits.
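
The square-aligned component can be illustrated with a small Python sketch.
It is a hedged illustration, not the paper's structure: the names
`square_max_matrix` and `square_aligned_max` are hypothetical, each matrix
cell stores only the x-coordinate of the best point (from which the
y-coordinate and priority follow via `nu` and `pi`), and a naive scan over the
matrix stands in for the $O(n)$-bit 2D range-maximum structure of [22, 6].

```python
def square_max_matrix(nu, pi, k):
    """M[row][col] = x-coordinate of the highest-priority point in the k x k
    square at column `col` and row `row`, or None if that square is empty.
    Assumes len(nu) is a multiple of k."""
    m = len(nu) // k
    M = [[None] * m for _ in range(m)]
    for x, y in enumerate(nu):
        row, col = y // k, x // k
        if M[row][col] is None or pi[x] > pi[M[row][col]]:
            M[row][col] = x
    return M


def square_aligned_max(M, pi, col1, col2, row1, row2):
    """Best point over the squares in columns col1..col2 and rows row1..row2.
    Naive scan used here in place of a 2D range-maximum index on M."""
    best = None
    for row in range(row1, row2 + 1):
        for col in range(col1, col2 + 1):
            x = M[row][col]
            if x is not None and (best is None or pi[x] > pi[best]):
                best = x
    return best
```

Because a square-aligned query covers whole squares, only one precomputed
entry per square needs to be inspected, never the individual points.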

Finally, each recursive problem has $O(\lg N) = O(L)$ bits of "header"
information, containing, e.g., the bounding box of the problem in the top-level
coordinate system. Ignoring the header information, the space usage is given by:



$$S(n) = 2\sqrt{n/L}\; S(\sqrt{nL}) + O(n \sqrt{\lg n}),$$

which after $r$ levels of recursion becomes:

$$S(N) = 2^r (N/L)^{1 - 1/2^r}\; S(N^{1/2^r} L^{1 - 1/2^r}) + O(2^{r/2} N \sqrt{\lg N}).$$

The recursion is terminated for the first level $r$ where
$2^r > \lg N / \lg \lg N$. At this level, the problems are of size
$O((\lg N)^2)$ and $\Omega((\lg N)^{3/2})$ and the second term in the space
usage becomes $O(N \lg N)$ bits. Applying $S(n) = O(n \lg n)$ for the base
case, we see that the first term is
$O((\lg N / \lg \lg N) \cdot N \cdot \lg \lg N) = O(N \lg N)$ bits, and the
space used by the header information is indeed negligible.
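
As a numeric sanity check of the recursion depth and the base-case problem
size, the following Python sketch iterates $n \mapsto \sqrt{nL}$ with
$L = \lg N$ until the stated termination condition holds. The function name
`recursion_profile` is hypothetical, and sizes are kept as real numbers rather
than being rounded to powers of 2, so the values are only indicative.

```python
import math


def recursion_profile(N):
    """Problem sizes n_0 = N, n_{i+1} = sqrt(n_i * L) with L = lg N, stopping
    at the first level r with 2**r > lg N / lg lg N."""
    L = math.log2(N)
    threshold = L / math.log2(L)
    sizes, r = [float(N)], 0
    while 2 ** r <= threshold:
        sizes.append(math.sqrt(sizes[-1] * L))
        r += 1
    return r, sizes


r, sizes = recursion_profile(2 ** 64)
```

For $N = 2^{64}$ the loop stops at $r = 4$, and the final problem size lies
between $(\lg N)^{3/2} = 512$ and $(\lg N)^2 = 4096$, in line with the bounds
stated above.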




Fig. 2. The recursive decomposition of the input (L) and queries (R). In (R), shaded 
problems are terminal problems. The 4-sided query is decomposed into a square-aligned 
4-sided query in the middle and four recursive 3-sided queries, two in horizontal slabs 
and two in vertical slabs. The 3-sided query is decomposed into two 2-sided queries 
in vertical slabs, a square-aligned 3-sided query and one recursive 3-sided query in a 
horizontal slab. 



We now discuss the time complexity. In general, we have to answer either
2-sided, 3-sided or 4-sided queries on a slab. Note that (see Fig. 2(R)):

- A 2-sided query is terminal and generates one candidate.

- A 3-sided query results in at most one recursive 3-sided query on a slab
  (generating no candidates), at most two 2-sided queries on slabs, and at
  most one square-aligned 3-sided query (generating one candidate).

- A 4-sided query either results in a recursive 4-sided query on a slab
  (generating no candidates) or generates at most one square-aligned 4-sided
  query (generating one candidate), plus up to four 3-sided queries in slabs.

Since each 3-sided query only generates one recursive 3-sided query, the number
of recursive problems solved, and hence the number of candidates, is
$O(r) = O(\lg \lg N)$. The time complexity will depend on the cost of mapping
query points and candidates between the problems and their recursive
sub-problems and the cost of solving the terminal problems; the former is
discussed next.
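
The 4-sided case of the decomposition can be sketched in Python as follows.
This is one possible reading of Fig. 2(R), not code from the paper: the
function name `decompose_4sided`, the piece labels, and the choice to give the
two vertical-slab pieces the full y-range of the query are assumptions. All
rectangles are closed and given as `(x1, x2, y1, y2)` in the local coordinates
of a problem whose slabs have width and height `k`.

```python
def decompose_4sided(x1, x2, y1, y2, k):
    """Split the closed rectangle [x1, x2] x [y1, y2] relative to slabs of
    width/height k into labelled, pairwise disjoint pieces."""
    if x1 // k == x2 // k:  # entirely inside one vertical slab
        return [("recurse-4sided-vertical-slab", (x1, x2, y1, y2))]
    if y1 // k == y2 // k:  # entirely inside one horizontal slab
        return [("recurse-4sided-horizontal-slab", (x1, x2, y1, y2))]
    ax1 = -(-x1 // k) * k        # first slab boundary at or after x1
    ax2 = (x2 + 1) // k * k - 1  # last column of the last fully covered slab
    ay1 = -(-y1 // k) * k
    ay2 = (y2 + 1) // k * k - 1
    pieces = []
    if x1 < ax1:  # partial vertical slab on the left
        pieces.append(("3sided-vertical-slab", (x1, ax1 - 1, y1, y2)))
    if ax2 < x2:  # partial vertical slab on the right
        pieces.append(("3sided-vertical-slab", (ax2 + 1, x2, y1, y2)))
    if ax1 <= ax2:  # at least one fully covered vertical slab
        if y1 < ay1:
            pieces.append(("3sided-horizontal-slab", (ax1, ax2, y1, ay1 - 1)))
        if ay2 < y2:
            pieces.append(("3sided-horizontal-slab", (ax1, ax2, ay2 + 1, y2)))
        if ay1 <= ay2:
            pieces.append(("square-aligned", (ax1, ax2, ay1, ay2)))
    return pieces
```

Each call yields at most one square-aligned piece and at most four 3-sided
pieces, matching the counts in the list above.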



3.2 The slab-rank and slab-select problems

The input to each recursive problem of size $n$ is given in local coordinates
(i.e. from $[n] \times [n]$). Upon decomposing the query to this problem, we
need to solve the following slab-rank problem (with a symmetric variant for
vertical slabs):

Given a point $p$ in top-level coordinates, which is mapped to $(i, j)$ in a
recursive problem of size $n$, such that $(i, j)$ lies in a horizontal slab of
size $n \times k$, map $(i, j)$ to the appropriate position $(i', j')$ in the
size-$k$ problem represented by this slab.

We formalize the "inverse" slab-select problem as follows:

Given a rectangle $R$ in the coordinate system of a recursive problem,
return the top-level coordinates of all points that lie within $R$.

The following lemma assumes and builds upon the four "global" data structures
mentioned after the statement of Theorem 1.

Lemma 4. The slab-rank problem can be solved in $O(\lg N / \lg \lg N)$ time,
and the slab-select problem in $O(\lg N / \lg \lg N)$ time as well, provided
that $R$ contains at most $O(\sqrt{\lg N})$ points.

Proof. We first consider the slab-rank problem. Without loss of generality,
assume that the given $(i, j)$ is in a horizontal $n \times k$ slab. To
translate $j$ to $j'$, we only need to subtract the appropriate multiple of $k$
in $O(1)$ time. To map $i$ to $i'$, we need to count the number of points in
the slab with x-coordinate smaller than $i$ (see Fig. 3(L)). Since the
top-level coordinates of the point $(i, j)$ are known, and the top-level
coordinates of the slab are also stored in the "header" of the slab by
assumption, top-level coordinates of all sides of the query are known. As all
input points are stored in a global instance of the data structure of Lemma 1,
counting the number of points can be performed by an orthogonal range counting
query in $O(\lg N / \lg \lg N)$ time.




The slab-select problem is solved in a similar manner, and is most easily
explained with reference to Fig. 3(R). We are given a recursive sub-problem
$P$ and a rectangle $R$ within $P$, and we know $P$'s bounding box in the
top-level problem (as shown on the far right). Our aim is to retrieve the
top-level coordinates of the image of $R$ in the top-level problem. Suppose
that the local x and y coordinates of $R$ are $x_l$, $x_r$, $y_b$ and $y_t$.
Then we perform an orthogonal range count (Lemma 1) in the area $A$, which lies
under $P$ but within $P$'s x-coordinates. This takes $O(\lg N / \lg \lg N)$
time, and let $z$ be the value returned. We then select the $(z + y_b)$-th and
$(z + y_t)$-th smallest y-coordinates within $X[x_l], \ldots, X[x_r]$ by
Lemma 3, also taking $O(\lg N / \lg \lg N)$ time. The (top-level)
y-coordinates of the points returned (shown shaded in the middle) are $R$'s
boundaries in the top-level coordinate system. A similar query on
$Y[y_b], \ldots, Y[y_t]$ gives the other boundaries of $R$ in the top-level
coordinate system. The $O(\sqrt{\lg N})$ points in this rectangle are then
retrieved in $O((\lg N)^{5/6}) = o(\lg N / \lg \lg N)$ time by Lemma 2.

□
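
The two mappings in the proof can be sketched in Python with naive stand-ins
for the global structures. Everything named here is an assumption made for
illustration: `range_count` and `range_select` replace Lemmas 1 and 3 with
linear-time scans, a sub-problem is identified by its top-level bounding box
`box = (x_lo, x_hi, y_lo, y_hi)`, local coordinates are 0-based ranks among
the points in that box, and the selection on `X` is taken over the box's
top-level x-range, which is one reading of the proof.

```python
def range_count(X, x_lo, x_hi, y_lo, y_hi):
    """Naive stand-in for Lemma 1: number of points (i, X[i]) in the closed box."""
    return sum(1 for i in range(x_lo, x_hi + 1) if y_lo <= X[i] <= y_hi)


def range_select(A, i, j, k):
    """Naive stand-in for Lemma 3: the k-th smallest (k >= 1) value in A[i..j]."""
    return sorted(A[i:j + 1])[k - 1]


def slab_rank(X, box, p):
    """Map the top-level point p to local coordinates inside `box`."""
    x_lo, x_hi, y_lo, y_hi = box
    px, py = p
    local_x = range_count(X, x_lo, px - 1, y_lo, y_hi)  # box points left of p
    local_y = range_count(X, x_lo, x_hi, y_lo, py - 1)  # box points below p
    return local_x, local_y


def slab_select(X, Y, box, rect):
    """Top-level points inside the local rectangle rect = (lx_lo, lx_hi, ly_lo, ly_hi)."""
    x_lo, x_hi, y_lo, y_hi = box
    lx_lo, lx_hi, ly_lo, ly_hi = rect
    # z = number of points below the box but within its x-range (area A).
    z = range_count(X, x_lo, x_hi, 0, y_lo - 1)
    ty_lo = range_select(X, x_lo, x_hi, z + ly_lo + 1)
    ty_hi = range_select(X, x_lo, x_hi, z + ly_hi + 1)
    # Symmetric step on Y gives the top-level x-boundaries.
    w = range_count(Y, y_lo, y_hi, 0, x_lo - 1)
    tx_lo = range_select(Y, y_lo, y_hi, w + lx_lo + 1)
    tx_hi = range_select(Y, y_lo, y_hi, w + lx_hi + 1)
    # Naive stand-in for the range reporting structure of Lemma 2.
    return [(i, X[i]) for i in range(tx_lo, tx_hi + 1) if ty_lo <= X[i] <= ty_hi]
```

Here `X[i]` is the y-coordinate of the point with x-coordinate `i`, and `Y` is
the inverse permutation, as in the overview of Section 3. The points returned
by `slab_select` are exactly those whose `slab_rank` falls inside `rect`.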

Remark 1. In all applications of the slab-rank result, the top-level
coordinates of $(i, j)$ will be known, as $(i, j)$ will either be a vertex of
the original input query, or the result of intersecting a horizontal (or
vertical) line through the vertex of a recursive problem (whose top-level
coordinates are inductively known) with a vertical (or horizontal) line
defining a recursive sub-problem.

3.3 Encoding 2-sided queries 

In this section we show Lemma 5. Although the reduction of 2-sided range
maxima queries at point $q$ ($\mathrm{RMQ}(q)$ hereafter) to orthogonal planar
point location is not new, the observation about the amount of priority
information needed to answer RMQ is new (and essential). This lemma shows that
although storing the permutation $\pi$ itself requires $\Theta(n \lg n)$ bits,
the "effective entropy" [14] of the permutation with respect to 2-sided range
maximum queries is much lower, generalizing the equivalent statement regarding
1D range-maximum queries [12, 21].

Lemma 5. Given a set $S$ of $n$ points from $\mathbb{R}^2$ and relative
priorities given as a permutation $\pi$ on $[n]$, the query $\mathrm{RMQ}(q)$
can be reduced to point location of $q$ in a collection of at most $n$
horizontal semi-open line segments, whose left endpoints are points from $S$,
and whose right endpoints have x-coordinate equal to the x-coordinate of some
point from $S$. Further, given at most $2(n - r) + \lg \binom{n}{r} < 3n$ bits
of extra information, the collection of line segments can be reconstructed
from $S$, where $r$ is the number of redundant points (those that are never
the answer to any query) in $S$, and this bound is tight.

Proof. Assume the points are in general position and that the 2-sided query is
open to the top and left. Associate each point $p = (x(p), y(p)) \in S$ with a
horizontal semi-open line of influence, possibly of length zero, whose left
endpoint (included in the line) is $p$ itself. This line is denoted by
$\mathrm{Inf}(p)$ and contains all points $q$ such that $y(q) = y(p)$,
$x(q) \ge x(p)$ and $\mathrm{RMQ}(q) = p$. It can be seen (see e.g. [17]) that
the answer to $\mathrm{RMQ}(q)$ for any $q \in \mathbb{R}^2$ can be obtained by
shooting a vertical ray upward from $q$ until the first line
$\mathrm{Inf}(p)$ is encountered; the answer to $\mathrm{RMQ}(q)$ is then $p$
(if no line is encountered then there is no point in the 2-sided region
specified by $q$). See Fig. 4 for an example.




Fig. 4. (Left) Example for Lemma 5. The horizontal lines are the lines of
influence. Vertical dotted lines show where a point has terminated the line of
influence of another point. The arrow shows how point location in the lines of
influence answers the 2-sided query with lower right hand corner at $q$,
returning the point with priority 7. (Right) Example showing tightness of
space bound: points along the diagonal are $\mathcal{A}$ and the queries at
$p$ and $q$ illustrate how 1D RMQ queries can be answered and redundancy of
elements tested, respectively.



The set $\mathrm{Inf}(S) = \{\mathrm{Inf}(p) \mid p \in S\}$ can be computed by
sweeping a vertical line from left to right. At any given position $x = t$ of
the sweep line, the sweep line will intersect $\mathrm{Inf}(S')$ for some set
$S'$ (initially $S' = \emptyset$). If $S' = \{p_{i_1}, \ldots, p_{i_{r'}}\}$
such that $y(p_{i_1}) > \cdots > y(p_{i_{r'}})$ then it follows that
$\pi(i_1) < \cdots < \pi(i_{r'})$ (the current lines of influence taken from
top to bottom represent points with increasing priorities). Upon reaching the
next point $p_s$ such that $y(p_{i_j}) > y(p_s) > y(p_{i_{j+1}})$, either
(i) $\pi(s) < \pi(i_j)$, in which case $\mathrm{Inf}(p_s)$ is empty, or
(ii) $\pi(s) > \pi(i_j)$. In the latter case, it may be that
$\pi(s) > \pi(i_{j+1}), \ldots, \pi(i_{j+k})$ for some $k \ge 0$, which would
mean that $\mathrm{Inf}(p_{i_{j+1}}), \ldots, \mathrm{Inf}(p_{i_{j+k}})$ are
terminated, with their right endpoints being $x(p_s)$. To construct
$\mathrm{Inf}(S)$, therefore, only $O(n)$ bits of information are needed: for
each point, one bit is needed to indicate whether case (i) or (ii) holds, and
in the latter case, the value of $k$ needs to be stored. However, $k$ can be
stored in unary using $k + 1$ bits, and the total value of $k$, over the
course of the entire sweep, is at most $(n - r)$, giving a total of at most
$2(n - r)$ bits. The bit-string that indicates whether case (i) or (ii) holds
can be stored in $\lg \binom{n}{r} < n$ bits.
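
The sweep and the resulting encoding can be sketched in Python. This is a
minimal illustration under stated assumptions, not the paper's implementation:
points are given in rank space as `(x, ys[x])` with priority `prio[x]`, the
function names are hypothetical, and the case bits are stored plainly (one per
point) instead of being compressed to $\lg \binom{n}{r}$ bits, so the bit list
has length at most $3n$. The decoder uses only the coordinates and the bits,
never the priorities.

```python
from bisect import bisect_left


def encode_lines_of_influence(ys, prio):
    """Left-to-right sweep. Returns, per point, one case bit followed (in case
    (ii)) by k in unary, where k is the number of lines the point terminates."""
    active = []  # (y, x) of lines crossing the sweep line, ascending in y
    bits = []
    for x, y in enumerate(ys):
        i = bisect_left(active, (y, x))
        # active[i] is the nearest live line above the new point; it has the
        # highest priority among all live lines above.
        if i < len(active) and prio[active[i][1]] > prio[x]:
            bits.append(0)  # case (i): Inf(p) is empty
            continue
        k = 0  # case (ii): live lines just below with lower priority end here
        while i - k > 0 and prio[active[i - k - 1][1]] < prio[x]:
            k += 1
        active[i - k:i] = [(y, x)]
        bits += [1] + [1] * k + [0]
    return bits


def decode_lines_of_influence(ys, bits):
    """Rebuild {x: (y, x_left, x_right)} from coordinates and bits only.
    Segments are half-open: a segment is live at qx if x_left <= qx < x_right."""
    n = len(ys)
    active, segs, pos = [], {}, 0
    for x, y in enumerate(ys):
        case_bit, pos = bits[pos], pos + 1
        if case_bit == 0:
            continue
        k = 0
        while bits[pos] == 1:
            k, pos = k + 1, pos + 1
        pos += 1  # skip the 0 that ends the unary code
        i = bisect_left(active, (y, x))
        for yy, xx in active[i - k:i]:
            segs[xx] = (yy, xx, x)  # terminated by the current point
        active[i - k:i] = [(y, x)]
    for yy, xx in active:
        segs[xx] = (yy, xx, n)  # never terminated
    return segs


def rmq_2sided(segs, qx, qy):
    """Shoot a ray upward from (qx, qy): the lowest live segment at height >= qy
    gives the answer to the query {x <= qx, y >= qy}."""
    hits = [(y, x) for x, (y, xl, xr) in segs.items() if xl <= qx < xr and y >= qy]
    return min(hits)[1] if hits else None
```

The linear scan in `rmq_2sided` stands in for a point location structure; the
point of the sketch is that the segments themselves are recoverable from the
coordinates plus $O(n)$ bits.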

To show that this is tight, we consider the 1D range maximum with redundant
entries problem, defined as follows. Given an array $A$ of size $n$, of which
$r$ entries are redundant, answer the following two queries: firstly, given an
index $i$, state whether $A[i]$ is redundant, and secondly, given indices
$1 \le i \le j \le n$, return the index of the largest value in a query
interval $A[i], \ldots, A[j]$ (ignoring redundant values). Given $n$, $r$, it
is easy to see that $2(n - r) + \lg \binom{n}{r} - O(\lg n)$ bits are required
to encode the answers to the above queries, namely, to answer these queries
without accessing $A$. This is because there are $\binom{n}{r}$ choices for
the positions of the redundant elements, and for each choice of the positions
of the redundant elements there are
$C_{n-r} = \frac{1}{n-r+1} \binom{2(n-r)}{n-r}$ partial orders among the
values in $A$ that can be distinguished by 1D range maximum queries [12, 21].
We associate the values in $A$ with a set of $n$ points $\mathcal{A}$ placed on
a slanted line (in increasing x-coordinate order), and give each point in
$\mathcal{A}$ a priority equal to the corresponding entry in $A$. The point
associated with each redundant value in $A$ is given a priority of $-\infty$.
In addition we place an $(n+1)$-st point $z$ that dominates all of the points
in $\mathcal{A}$ (and hence is included in any query that also includes a
point from $\mathcal{A}$), whose priority is greater than $-\infty$ but less
than the smallest priority associated with a non-redundant entry of $A$ (see
Fig. 4(R)). The 1D range maximum query problem with redundant entries can be
solved via 2-sided 2D range maximum queries that include only $z$ and the
appropriate sub-range of $\mathcal{A}$. Also, we can determine if an entry in
$A$ is redundant by making a 2-sided query that includes only the
corresponding point from $\mathcal{A}$ and $z$; the point is redundant iff the
answer to such a query is $z$. □

Remark 2. It can be shown that $2(n - r) + \lg \binom{n}{r} \le n \lg 5 + o(n)$,
which is at most $2.33n$ bits for large $n$.
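
The bound in Remark 2 can be checked numerically with a short Python sketch.
The function names are hypothetical; the code evaluates
$2(n - r) + \lg \binom{n}{r}$ for every $r$ and compares the maximum, divided
by $n$, against $\lg 5 \approx 2.32$.

```python
import math


def encoding_bits(n, r):
    """The bound 2(n - r) + lg C(n, r) from Lemma 5."""
    return 2 * (n - r) + math.log2(math.comb(n, r))


def worst_case_ratio(n):
    """Maximum of encoding_bits(n, r) / n over all 0 <= r <= n."""
    return max(encoding_bits(n, r) for r in range(n + 1)) / n
```

The maximum is attained near $r = n/5$: writing $\rho = r/n$ and using
$\lg \binom{n}{r} \le n H(\rho)$, the expression $2(1 - \rho) + H(\rho)$ is
maximized at $\rho = 1/5$, where it equals $\lg 5$.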

3.4 Data structures for 2-sided queries 

In this section we show the following lemma: 

Lemma 6. Given a recursive sub-problem of size $n$, we can answer 2-sided
queries on this problem in $O(\lg N)$ time using $O(n \sqrt{\lg n})$ bits of
space.

This lemma assumes and builds upon the four "global" data structures mentioned
after the statement of Theorem 1. We view the given problem of size $n$, on
which we need to support 2-sided queries, as point location in a collection of
$O(n)$ horizontal line segments as in Lemma 5, but with a limited space budget
of $O(n \sqrt{\lg n})$ bits. We therefore need to devise an implicit
representation of these problems. We begin with an overview of the process.
Let $T$ be the set of $n$ points in the sub-problem we are considering. We
start with an explicit representation of $\mathrm{Inf}(T)$ and choose a
parameter $\lambda = \Theta(\sqrt{\lg n})$. We select $O(n/\lambda)$ lines of
influence that partition the plane into rectangular regions with $O(\lambda)$
points (from $T$) and parts of line segments (from $\mathrm{Inf}(T)$) [3, 10]
and store a standard point location data structure on the selected lines of
influence. This data structure, called the skeleton, requires
$O((n/\lambda) \lg n) = O(n \sqrt{\lg n})$ bits. Furthermore, we store
$O(\lambda)$ bits of information with each region (including the
$O(\lambda)$-bit encoding of priority information from Lemma 5).

The query proceeds as follows. Given a query point $q$, we first perform a
point location query on the skeleton to determine the region $R$ in which $q$
lies. We now need to reconstruct the original point location structure within
$R$, and perform a slab-select to determine the points of $T$ that lie within
this region. This, together with the priority information, allows us to
partially (but not fully, since lines of influence may originate from outside
$R$) reconstruct the point location structure within $R$. To handle lines of
influence starting outside $R$, we do a binary search with $O(\lg \lambda)$
steps, where in each step we need to perform a slab-select, giving the claimed
bound. The details are as follows.

Preprocessing. As noted above, we first create $\mathrm{Inf}(T)$, take
$\lambda = \sqrt{\lg n}$ and select a set of points $T' \subseteq T$ with the
following properties: (a) $|T'| = O(n/\lambda)$; (b) the vertical
decomposition of the plane induced by $\mathrm{Inf}(T')$, whereby we shoot
vertical rays upward and downward from each endpoint of each segment in
$\mathrm{Inf}(T')$ until they hit another segment (see Fig. 5), decomposes the
plane into $O(n/\lambda)$ rectangular regions each of which has at most
$O(\lambda)$ points from $T$ and parts of line segments from
$\mathrm{Inf}(T)$ in it. $T'$ always exists and can be found by plane sweep
[3, Section 3], [10, Section 4.3]. The skeleton is any standard point location
data structure on $\mathrm{Inf}(T')$.

Let $R$ be any region, and let $\mathrm{Left}(R)$ ($\mathrm{Right}(R)$) be the
set of line segments from $\mathrm{Inf}(T)$ that intersect the left (right)
boundary of $R$, and let $P(R)$ be the set of points from $T$ in $R$. We store
the following bit strings for $R$:

1. For each line segment $\ell \in \mathrm{Left}(R)$, ordered top to bottom by
   y-coordinate, a bit that indicates whether the right endpoint of $\ell$ is
   in $R$ or not; similarly for $\ell \in \mathrm{Right}(R)$, a bit indicating
   whether $\ell$ begins in $R$ or not.

2. If the left boundary of $R$ is adjacent to other regions $R_1, R_2, \ldots$
   (taken from top to bottom) and $l_i \ge 0$ represents the number of line
   segments from $\mathrm{Left}(R)$ that also intersect $R_i$, then we store a
   bit-string $0^{l_1} 1 0^{l_2} 1 \ldots$. A similar bit-string is stored for
   the right boundary of $R$.

3. For each point in $P(R)$ and each line segment in $\mathrm{Left}(R)$, a
   bit-string of length $|P(R)| + |\mathrm{Left}(R)|$ whose $i$-th bit
   indicates whether the $i$-th largest y-coordinate in
   $P(R) \cup \mathrm{Left}(R)$ is from $P(R)$ or $\mathrm{Left}(R)$.

The purpose of (1) and (2) is to trace a line segment as it crosses multiple
regions: if a line segment crosses from a region $R'$ to a region $R''$ on its
right, then given its position in $\mathrm{Right}(R')$, we can deduce its
position in $\mathrm{Left}(R'')$ and vice versa. However, there is no useful
bound on the number of regions a single line segment may cross, so we store
the following information:

4. Suppose that a line segment ℓ = Inf(p) for some p ∈ T crosses m > λ
regions. Then, in every λ-th region that ℓ crosses, we explicitly store the
region containing p, and p's local coordinates. As in (1), for each region R,
we store one bit for each ℓ ∈ Right(R), ℓ = Inf(p), indicating whether or
not R holds information about p.

5. Finally, for each point p ∈ P(R), we store the sequence of bits from Lemma 5,
which indicates whether p has a non-empty Inf(p) and if so, for how many
lines from Left(R) ∪ Inf(P(R)), p is a right endpoint (p cannot be a right
endpoint of any other line in Inf(T), by the construction of the skeleton).
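
To illustrate how the bit-string in (2) can be used, the following minimal Python sketch encodes the counts l_1, l_2, ... in unary and recovers, for the j-th segment crossing a boundary (counted from the top), which adjacent region it enters. The function names are hypothetical and not taken from the paper, and the linear scan merely stands in for the constant-time rank and select operations [20] that the data structure would rely on.

```python
def encode_adjacency(counts):
    """Encode counts l_1, l_2, ... as the bit-string 0^{l_1} 1 0^{l_2} 1 ..."""
    return "".join("0" * c + "1" for c in counts)


def neighbour_of_segment(bits, j):
    """Return the 0-based index of the adjacent region entered by the
    j-th segment (0-based, top to bottom) on this boundary.

    The answer is the number of 1s that precede the j-th 0, that is,
    rank_1(select_0(j)). A plain scan is used here for clarity; this is
    an illustrative stand-in for a succinct rank/select structure.
    """
    zeros_seen = 0
    ones_seen = 0
    for b in bits:
        if b == "0":
            if zeros_seen == j:
                return ones_seen
            zeros_seen += 1
        else:
            ones_seen += 1
    raise IndexError("segment index out of range")
```

For the counts (2, 0, 3) the encoding is 00110001, and the third segment from the top (j = 2) is reported as entering the third neighbour, since the second neighbour shares no segment with R.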

^ Note that the extent of a line segment in Inf(T') is defined, as originally, with
respect to the points in T, and not with respect to the points in T'.



As noted previously, the skeleton takes O(n√(lg n)) bits, within our budget.
We now add up the space required for (1)-(5). By construction, the sum of
|Left(R)|, |Right(R)| and |P(R)| is O(n) summed over all regions R. The space
bound for (1) and (3) is therefore O(n) bits. The number of 1s in the bit string
of (2), summed over all regions, is O(n/λ), as there are O(n/λ) regions and the
graph which indicates adjacency of regions is planar; the number of 0s is O(n)
as before. The space used by (4) is O(n√(lg n)) bits again, as for every O(√(lg n))
portions of line segments in the regions we store O(lg n) bits. Finally, the space
used for (5) is O(n) bits by Lemma 5.
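
As a sanity check, the contributions above can be tallied in one line. The tally assumes λ = Θ(√(lg n)), which is the value implied by the bound |P(R)| = O(√(lg n)) used in the time analysis; it is a restatement of the argument, not an additional result.

```latex
\underbrace{O\bigl(n\sqrt{\lg n}\bigr)}_{\text{skeleton}}
+ \underbrace{O(n)}_{(1),\,(3)}
+ \underbrace{O(n/\lambda) + O(n)}_{(2)}
+ \underbrace{O\!\left(\frac{n}{\lambda}\cdot\lg n\right)}_{(4)}
+ \underbrace{O(n)}_{(5)}
\;=\; O\bigl(n\sqrt{\lg n}\bigr)\ \text{bits}.
```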

Query algorithm. Suppose that we are given a query point q in a sub-problem
of size n and need to answer RMQ(q) (assume that we have q's local and
top-level coordinates). The query algorithm proceeds as follows:

(a) Do a planar point location in the skeleton, and find a region R in which the
point q lies. Perform slab-select on R to get P(R).

(b) As we know how many segments from Left(R) lie vertically between any pair
of points in P(R), when we are given the data in (5) above, we are able to
determine whether the x-coordinate of a given point p in P(R) is the right
endpoint of a line from either Left(R) or Inf(P(R)). Thus, we have enough
information to determine Inf(p) for all p ∈ P(R) (at least until the right
boundary of R). Furthermore, for each line in Left(R) that terminates in R,
we also know (the top-level coordinates of) its right endpoint.

(c) Using the top-level coordinates of q, we determine the nearest segment from
Inf(P(R)) that is above q.

(d) Using the top-level coordinates of q we also find the set of segments from
Left(R) whose right endpoints are not to the left of q. Let this set be Left*(R).
We now determine the nearest segment from Left*(R) that is above q.
Unfortunately, although |Left*(R)| = O(λ), since the segments in Left*(R)
originate in points outside R, we do not have their y-coordinates. Hence, we
need to perform the following binary search on Left*(R):

(d1) Take the line segment ℓ ∈ Left*(R) with median y-coordinate, and
suppose that ℓ = Inf(p). The first task is to find the region R_p containing
p, as follows. Use (2) to determine which of the adjacent regions of R ℓ
intersects, say this is R'. If ℓ ends in R' (so that R' = R_p), or R' holds
information about p as in (4), we are done.
Otherwise, use (1) to locate ℓ in Left(R') and continue.

(d2) Once we have found R_p, we perform a slab-select on R_p to determine
P(R_p), and sort P(R_p) by y-coordinate. Then we perform (b) above on P(R_p),
thus determining which points of P(R_p) have lines of influence that reach
the right boundary of R_p. Using this we can now determine the (top-level)
coordinates of p.

(d3) We compare the top-level y-coordinates of p and q and recurse.

(e) We take the lower of the lines found in (c) and (d) and use it to return a
candidate. Observe that we have the top-level coordinates of this candidate.
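
The binary search of step (d) can be sketched as follows. This is a minimal illustration rather than the paper's implementation: `resolve_y` is a hypothetical callback that stands in for steps (d1)-(d2), which recover the top-level y-coordinate of a segment's originating point, and the sketch assumes that "above" means a larger y-coordinate and that segments are indexed from top to bottom. The point is that only O(lg λ) of the expensive coordinate look-ups are made.

```python
def nearest_above(num_segments, resolve_y, q_y):
    """Binary search for the lowest segment lying strictly above height q_y.

    Segments are indexed 0..num_segments-1 from top to bottom, so their
    y-coordinates decrease with the index. resolve_y(i) is a hypothetical,
    assumed-expensive callback returning the top-level y-coordinate of the
    i-th segment. Returns the index of the nearest segment above q_y, or
    None if no segment lies above it.
    """
    lo, hi = 0, num_segments - 1
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if resolve_y(mid) > q_y:
            best = mid       # mid is above q; a lower segment may also be
            lo = mid + 1
        else:
            hi = mid - 1     # mid is at or below q; look further up
    return best
```

With segment heights (90, 70, 40, 10) and a query at height 50, the search resolves only the segments at indices 1 and 2 and returns index 1.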

We now derive the time complexity of a 2-sided query. Step (a) takes O(lg n)
for the point location, and O(lg N / lg lg N) for the slab-select. Step (b) can
be done in O(lg n) = O(lg N) time by running the plane sweep algorithm of
Lemma 5 (recall that |P(R)| = O(√(lg n)), so a quadratic algorithm will suffice).
Step (c) likewise can be done by a simple plane sweep in O(lg n) time. Step (d1)
is iterated at most O(√(lg n)) times before R_p is found since every λ-th region
intersected by ℓ contains information about p. Each iteration of (d1) takes O(1)
time: operations on the bit-strings are done either by table lookup if the
bit-string is short (O(λ) bits), or else using rank and select operations [20], if
the bit-string is long (as e.g. the bit-string in (2) may be); these entirely standard
tricks are not described in detail. Step (d2) takes O(lg N / lg lg N) time as before.
Steps (d1)-(d3) are performed O(lg λ) = O(lg lg N) times, so this takes O(lg N)
time overall. Step (e) is trivial. We have thus shown Lemma 6.
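
The dominant term, the cost of step (d), can be written out as below. The derivation is a restatement under the assumptions that n ≤ N and λ = Θ(√(lg n)), so that √(lg n) = O(lg N / lg lg N).

```latex
\underbrace{O(\lg\lambda)}_{\text{binary-search rounds}}
\cdot\Bigl(
  \underbrace{O\bigl(\sqrt{\lg n}\bigr)}_{\text{(d1)}}
  + \underbrace{O\!\left(\frac{\lg N}{\lg\lg N}\right)}_{\text{(d2)}}
\Bigr)
\;=\; O(\lg\lg N)\cdot O\!\left(\frac{\lg N}{\lg\lg N}\right)
\;=\; O(\lg N).
```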

3.5 Putting things together 

As noted in Section 3.1, the space usage of our data structure is O(N) words.
Coming to the running time, we solve O(lg lg N) 2-sided queries using Lemma 6,
giving a time of O(lg N lg lg N). The O(lg lg N) square-aligned queries are solved
in O(1) time each. The O(1) terminal problems at the bottom of the recursion
are solved using Chazelle's algorithm in O((lg lg N)^2) time. Any candidate given
in local coordinates is converted to top-level coordinates in O(lg N / lg lg N) time,
or O(lg N) time overall. We simply sequentially scan all O(lg lg N) candidates
to find the answer. This proves Theorem 1.

4 A succinct index for 2-sided queries 

In this section, we give a succinct index for 2-sided range maxima queries over
N points in rank space. This is in essence a stand-alone variant of Lemma 6
and reuses its structure. As noted earlier, we consider the case where the point
coordinates are stored "elsewhere" and are assumed to be accessible in one of
two ways, repeated here for convenience:

- Through an orthogonal range reporting query. Here, we assume that a query
that results in k points being reported takes T(N, k) time. We assume that
T is such that T(N, O(k)) = O(T(N, k)).

- The permuted-point model of [4], where we assume that the point
coordinates are stored in read-only memory, permitting random access to the
coordinates of the i-th point. However, the ordering of the points is specified
by the data structure.

Note that the priority information, unlike the coordinates, is encoded within the 
index. 
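
The two access models can be pictured as two small interfaces. The sketch below is purely illustrative: the class and method names (`RangeReporter`, `PermutedPoints`, `report`, `point`) are hypothetical and do not come from the paper or from [4], and the two toy classes only show what a caller of each model may rely on.

```python
from typing import List, Protocol, Tuple

Point = Tuple[int, int]


class RangeReporter(Protocol):
    """Hypothetical interface for the orthogonal range reporting model."""

    def report(self, x1: int, x2: int, y1: int, y2: int) -> List[Point]:
        ...


class PermutedPoints(Protocol):
    """Hypothetical interface for the permuted-point model."""

    def point(self, i: int) -> Point:
        ...


class NaiveReporter:
    """Toy stand-in that scans every point, so T(N, k) = O(N) here."""

    def __init__(self, points: List[Point]) -> None:
        self._points = list(points)

    def report(self, x1: int, x2: int, y1: int, y2: int) -> List[Point]:
        return [(x, y) for (x, y) in self._points
                if x1 <= x <= x2 and y1 <= y <= y2]


class ArrayPoints:
    """Toy stand-in: the index dictates the order in which points are stored."""

    def __init__(self, points: List[Point], order: List[int]) -> None:
        # order[i] names the input point placed at storage position i.
        self._stored = [points[j] for j in order]

    def point(self, i: int) -> Point:
        return self._stored[i]
```

In the first model the index only ever asks for the points inside a rectangle; in the second it may read the i-th stored point directly, but in exchange it chooses the storage order.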

4.1 Succinct index in the orthogonal range reporting model 

Lemma 7. Let λ > 2 be some parameter. There is a succinct index of size
O(N + (N lg N)/λ) bits such that RMQ(q) queries can be answered in
O(lg N + lg λ (λ + T(N, λ))) time.



Proof. The proof follows closely the proof of Lemma 6, except that the
distinction between local and top-level coordinates vanishes, and the slab-select
operation is replaced by the assumed orthogonal range reporting query. The space
complexity of the skeleton and associated bit-strings is exactly as in Lemma 6.
For the time complexity, the planar point location to find the region R
containing the query point q takes O(lg N) time. Finding P(R) takes T(N, λ) time,
and each of the O(lg λ) iterations of the binary search takes O(λ + T(N, λ)) time. □

Choosing λ = lg N in the above, we get:

Corollary 1. There is a succinct index of O(N) bits such that RMQ(q) queries
can be answered in O(lg lg N · (lg N + T(N, lg N))) time.
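
For completeness, the substitution λ = lg N into the bounds of Lemma 7 works out as follows; this simply expands the step between the lemma and the corollary.

```latex
\text{space: } O\!\left(N + \frac{N\lg N}{\lg N}\right) = O(N)\ \text{bits},
\qquad
\text{time: } O\bigl(\lg N + \lg\lg N\,(\lg N + T(N,\lg N))\bigr)
           = O\bigl(\lg\lg N\cdot(\lg N + T(N,\lg N))\bigr).
```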

4.2 Succinct index in the permuted-point model 

Lemma 8. There is a succinct index of N lg 5 + o(N) = 2.33N + o(N) bits
for answering RMQ(q) queries on a set of N points in O(lg lg N) time in the
permuted-point model.

Fig. 5. Vertical decomposition of the plane induced by a collection of line segments,
and a part of the dual graph.

Proof (sketch). Let S be the set of input points. We again solve RMQ(q) queries
by planar point location on Inf(S). We consider the regions of the vertical
decomposition induced by Inf(S) as nodes in a planar graph (the dual graph of
the set of regions), with edges between adjacent cells (see Fig. 5). Using the
planar separator theorem, for any parameter λ > 1, there is a collection of O(n/λ)
cells such that, after the removal of this collection of cells, the graph decomposes
into connected components of size O(λ^2) each (this approach is used by
Bose et al. [4] and Chan and Pătraşcu [10], for example). As in [4] we use a
two-level decomposition, first decomposing the vertical decomposition of Inf(S)
using λ = (lg N)^2, and then decomposing the connected components themselves
using λ' = (lg lg N)^2. The key difference to Lemma 6 is that the boundaries of
the cells can be relatively compactly described. For instance, for each separator
cell in the top-level decomposition, we can specify the points in S that define its
four sides using O(lg N) bits, and still use only o(N) bits. In the second-level
decomposition, this information is stored in O(lg lg N) bits per separator cell
(again o(N) bits overall), as the relevant point will either belong to the same
top-level connected component of size O((lg N)^4), or else the relevant
information will be stored in one of the top-level separator cells that form its boundary.
The leading-order term comes from storing the N lg 5 + o(N) bits of priority
information needed to answer queries within the connected components. As in
[4], we also permute the points in a manner aligned with the decomposition,
which allows us to reconstruct the appropriate part of the planar point location
rapidly. The planar point location in the top level is performed using Chan's
data structure [8] taking O(lg lg N) time. Note that a third level of
decomposition, this time into connected components of size O((lg lg lg N)^2), is
needed to achieve "decompression" of the relevant parts of the point location
structure in O(lg lg N) time. □
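
The o(N) claims for the separator cells can be checked with a short calculation. It assumes the parameter values λ = (lg N)^2 and λ' = (lg lg N)^2 and that the number of separator cells at each level is O(N/λ) and O(N/λ') respectively, summed over all components; it is a sketch of the accounting, not a full proof.

```latex
\underbrace{O\!\left(\frac{N}{(\lg N)^{2}}\right)\cdot O(\lg N)}_{\text{top-level separator cells}}
= O\!\left(\frac{N}{\lg N}\right) = o(N),
\qquad
\underbrace{O\!\left(\frac{N}{(\lg\lg N)^{2}}\right)\cdot O(\lg\lg N)}_{\text{second-level separator cells}}
= O\!\left(\frac{N}{\lg\lg N}\right) = o(N).
```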

5 Conclusions 

We have introduced a new approach to producing space-efficient data structures
for orthogonal range queries. The main idea has been to partition the problem
into smaller sub-problems, which are stored in a "compressed" format and are
then "decompressed" on demand. We applied this idea to give the first
linear-space data structure for 2D range maxima that improves upon Chazelle's 1985
linear-space data structure.